{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 基于寒武纪 MLU 的模型训练--YOLOv5目标检测\n",
    "### --PyTorch, Python3, fp32\n",
    "\n",
    "## 目录\n",
    "### 0 基本信息\n",
    "### 1 实验内容及目标\n",
    "     1.1 实验内容\n",
    "     1.2 实验目标\n",
    "### 2 前置知识介绍\n",
    "     2.1 寒武纪软硬件平台\n",
    "     2.2 寒武纪 PyTorch 框架\n",
    "### 3 网络详解\n",
    "     3.1 网络结构\n",
    "### 4 模型训练\n",
    "     4.1 工程目录介绍\n",
    "     4.2 工程准备\n",
    "     4.3 移植修改\n",
    "     4.4 训练\n",
    "     4.5 精度验证\n",
    "### 5 结语\n",
    "     5.1 回顾重点步骤\n",
    "     5.2 相关链接\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 0 基本信息\n",
    "\n",
    "发布者：寒武纪\n",
    "\n",
    "实验时长：150 分钟\n",
    "\n",
    "语言：Python3\n",
    "\n",
    "修改时间：2022-10-12"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1 实验内容及目标\n",
    "## 1.1 实验内容\n",
    "\n",
    "本实验主要介绍基于寒武纪 MLU370 (寒武纪处理器，简称 MLU )与寒武纪 PyTorch 框架的 YOLOv5（v6.0版本）目标检测训练方法。在官方源码的基础上，只需要进行简单移植和修便可在 MLU370 加速训练 YOLOv5 算法，实现目标检测的功能。后续章节将会详细介绍移植过程。\n",
    "\n",
    "\n",
    "## 1.2 实验目标\n",
    "\n",
    "1. 掌握使用寒武纪 MLU370 和 PyTorch 框架进行 AI 模型训练的基本方法。\n",
    "\n",
    "2. 理解 YOLOv5 模型的整体网络结构及其适配流程。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2 前置知识介绍\n",
    "\n",
    "## 2.1 寒武纪软硬件平台介绍\n",
    "\n",
    " &emsp; 硬件：寒武纪 MLU370 AI 计算卡 \n",
    " \n",
    " &emsp; 框架：PyTorch 1.9 \n",
    " \n",
    " &emsp; 系统环境：寒武纪云平台 \n",
    "\n",
    "## 2.2 寒武纪 PyTorch 框架\n",
    "为⽀持寒武纪 MLU 加速卡，寒武纪定制了开源⼈⼯智能编程框架PyTorch（以下简称 Cambricon PyTorch）。    \n",
    "\n",
    "Cambricon PyTorch 借助 PyTorch ⾃⾝提供的设备扩展接⼝将 MLU 后端库中所包含的算⼦操作动态注册到 PyTorch 中，MLU 的后端库可处理 MLU 上的张量和⽹络算⼦的运算。Cambricon PyTorch 会基于 CNNL 库在 MLU 后端实现常⽤⽹络算⼦的计算，并完成数据拷⻉。    \n",
    "\n",
    "Cambricon PyTorch 兼容原⽣ PyTorch 的 Python 编程接⼝和原⽣ PyTorch ⽹络模型，⽀持以在线逐层⽅式进⾏训练和推理。⽹络权重可以从 pth 或 pt 格式⽂件读取，已⽀持的分类和检测⽹络结构由 Torchvision管理，可以从 Torchvision 中读取。对于训练任务，⽀持 float32 及定点量化模型。  \n",
    "\n",
    "获取更多有关Cambricon PyTorch资料，请参考 [寒武纪官网文档](https://developer.cambricon.com/index/document/index/classid/3.html)  PyTorch相关内容。  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3 模型架构\n",
    "\n",
    "## 网络结构\n",
    "\n",
    "YOLOv5针对不同大小（n,s,m,l,x）的网络整体架构不变，但会根据yaml文件中定义的 **depth_mutiple**和**width_mutiple** 参数，对每个子模块进行不同深度和宽度的缩放。\n",
    "\n",
    "以[YOLOv5m](https://github.com/ultralytics/yolov5/blob/v6.1/models/yolov5m.yaml)为例，其网络结构主要由以下几部分组成:\n",
    "\n",
    "```depth_multiple: 0.67 ``` 深度扩充：max(round(n * depth_multiple),1)，其中 **n** 为yaml文件中的 **number** 参数。\n",
    "\n",
    "```width_multiple: 0.75 ``` 宽度（通道）扩充：ceil(width_multiple \\* args\\[0\\] / 8 ) * 8，作用于yaml文件中的**args\\[0\\]**即卷积输出通道数。\n",
    "\n",
    "* Backbone: Conv(CBS), C3, SPPF。 \n",
    "\n",
    "```\n",
    "[from, number, module, args] \n",
    "[[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2 \n",
    "[-1, 1, Conv, [128, 3, 2]],  # 1-P2/4 \n",
    "[-1, 3, C3, [128]],  \n",
    "[-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n",
    "[-1, 6, C3, [256]], \\\n",
    "[-1, 1, Conv, [512, 3, 2]],  # 5-P4/16 \n",
    "[-1, 9, C3, [512]], \\\n",
    "[-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32 \n",
    "[-1, 3, C3, [1024]], \\\n",
    "[-1, 1, SPPF, [1024, 5]],  # 9 \n",
    "] \n",
    "```\n",
    "\n",
    "* Head: Conv(CBS), C3, Upsample, Concat。\n",
    "\n",
    "```\n",
    "[[-1, 1, Conv, [512, 1, 1]], \n",
    "[-1, 1, nn.Upsample, [None, 2, 'nearest']], \n",
    "[[-1, 6], 1, Concat, [1]],  # cat backbone P4 \n",
    "[-1, 3, C3, [512, False]],  # 13 \n",
    "\n",
    "[-1, 1, Conv, [256, 1, 1]], \n",
    "[-1, 1, nn.Upsample, [None, 2, 'nearest']], \n",
    "[[-1, 4], 1, Concat, [1]],  # cat backbone P3 \n",
    "[-1, 3, C3, [256, False]],  # 17 (P3/8-small) \n",
    " \n",
    "[-1, 1, Conv, [256, 3, 2]], \n",
    "[[-1, 14], 1, Concat, [1]],  # cat head P4 \n",
    "[-1, 3, C3, [512, False]],  # 20 (P4/16-medium) \n",
    "\n",
    "[-1, 1, Conv, [512, 3, 2]], \n",
    "[[-1, 10], 1, Concat, [1]],  # cat head P5 \n",
    "[-1, 3, C3, [1024, False]],  # 23 (P5/32-large) \n",
    "\n",
    "[[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5) \n",
    "]\n",
    "```\n",
    "\n",
    "* Detect: Decode, nms, topk(CPU端运行，不参与训练)。\n",
    "\n",
    "网络结构如下图所示:\n",
    "\n",
    "![avatar](./course_images/yolov5m.png)\n",
    "\n",
    " 其中:\n",
    " \n",
    "* CBS: 由 Conv + BN2d + SiLU 三者组成。\n",
    " \n",
    "* C3: [CSP Bottleneck](https://github.com/WongKinYiu/CrossStagePartialNetworks) with 3 convolutions,结构图上图所示。\n",
    " \n",
    "* SPPF: Spatial Pyramid Pooling (Fast)，空间金字塔池化结构，功能不变前提下提升运行速度。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4 模型训练\n",
    "\n",
    "## 4.1 工程目录介绍\n",
    "```\n",
    "└── pytorch_yolov5_train\n",
    "    ├── apply_patch.sh           \n",
    "    ├── course_images\n",
    "    │   └── yolov5m.png\n",
    "    ├── pytorch_yolov5_train.ipynb\n",
    "    ├── README.md   \n",
    "    ├── yolov5                        #  YOLOv5 工程，切换到v6.0版本\n",
    "    ├── requirements_mlu.txt\n",
    "    ├── prepare.sh\n",
    "    ├── utils_mlu\n",
    "    │   ├── collect_env.py\n",
    "    │   ├── common_utils.sh\n",
    "    │   ├── configs.json\n",
    "    │   ├── metric.py\n",
    "    │   └── __pycache__\n",
    "    └── yolov5_mlu.patch\n",
    "    ├── yolov5_model\n",
    "        └── yolov5m.yaml\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2 工程准备\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 安装依赖"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": true
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install -r ./requirements_mlu.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": true
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!apt update\n",
    "!apt install -y curl "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 模型和数据集下载\n",
    "\n",
    "默认使用 COCO 2017 数据集进行训练，这里数据集存放路径与官方一致，可直接使用官方脚本下载数据集。 \n",
    "```\n",
    "Download COCO 2017 dataset http://cocodataset.org\n",
    "Example usage: bash data/scripts/get_coco.sh \\\n",
    "parent \n",
    "├── projects \n",
    "├── model \n",
    "     └── pretrained  \n",
    "          └── coco  \n",
    "└── dataset \n",
    "     └── private   \n",
    "          └── coco  ← downloads here\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!sh prepare.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.3 移植修改"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "本实验实在官网原始 YOLOv5 工程的 v6.0 版本下移植修改而得到的。为便于用户移植和修改，我们通过 patch 方式将适配后的训练代码应用于源码。patch代码见 [yolov5_mlu.patch](./yolov5_mlu.patch)  命令如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bash apply_patch.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面，我们将根据**yolov5_mlu.patch**对训练代码适配MLU370过程进行详细解析。\n",
    "\n",
    "* **train.py**首先以train.py进行分析，其patch内容如下：\n",
    "\n",
    "```\n",
    "diff --git a/train.py b/train.py\n",
    "index 29ae43e..e56b1d5 100644\n",
    "--- a/train.py\n",
    "+++ b/train.py\n",
    "@@ -44,18 +44,22 @@ from utils.downloads import attempt_download\n",
    " from utils.loss import ComputeLoss\n",
    " from utils.plots import plot_labels, plot_evolve\n",
    " from utils.torch_utils import EarlyStopping, ModelEMA, de_parallel, intersect_dicts, select_device, \\\n",
    "-    torch_distributed_zero_first\n",
    "+    torch_distributed_zero_first, is_device_available\n",
    " from utils.loggers.wandb.wandb_utils import check_wandb_resume\n",
    " from utils.metrics import fitness\n",
    " from utils.loggers import Loggers\n",
    " from utils.callbacks import Callbacks\n",
    " \n",
    "+# benchmark\n",
    "+cur_dir = os.path.dirname(os.path.abspath(__file__))\n",
    "+sys.path.append(cur_dir + \"/../utils_mlu/\")\n",
    "+from metric import MetricCollector\n",
    "+\n",
    "```\n",
    "\n",
    "从上述代码可知，主要改动为：\n",
    "\n",
    "1. 在utils/torch_utils.py文件中新增**is_device_available**函数,该函数可判断mlu_device是否可获取，如下所示：\n",
    "\n",
    "```\n",
    "diff --git a/utils/torch_utils.py b/utils/torch_utils.py\n",
    "index 352ecf5..fd97b42 100644\n",
    "--- a/utils/torch_utils.py\n",
    "+++ b/utils/torch_utils.py\n",
    "@@ -27,6 +27,10 @@ except ImportError:\n",
    " \n",
    " LOGGER = logging.getLogger(__name__)\n",
    " \n",
    "+is_device_available = {\n",
    "+        'cuda': torch.cuda.is_available(),\n",
    "+        'mlu' : hasattr(torch, 'is_mlu_available') and torch.is_mlu_available()\n",
    "+    }\n",
    "```\n",
    "2. 新增导入utils_mlu目录，主要用于benchmark性能数据测试收集；\n",
    "\n",
    "---\n",
    "```\n",
    "@@ -97,7 +101,7 @@ def train(hyp,  # path/to/hyp.yaml or hyp dictionary\n",
    "\n",
    "     # Config\n",
    "     plots = not evolve  # create plots\n",
    "-    cuda = device.type != 'cpu'\n",
    "+    use_device = device.type in ['cuda', 'mlu']\n",
    "     init_seeds(1 + RANK)\n",
    "     with torch_distributed_zero_first(LOCAL_RANK):\n",
    "         data_dict = data_dict or check_dataset(data)  # check if None\n",
    "@@ -113,7 +117,7 @@ def train(hyp,  # path/to/hyp.yaml or hyp dictionary\n",
    "     if pretrained:\n",
    "         with torch_distributed_zero_first(LOCAL_RANK):\n",
    "             weights = attempt_download(weights)  # download if not found locally\n",
    "-        ckpt = torch.load(weights, map_location=device)  # load checkpoint\n",
    "+        ckpt = torch.load(weights, map_location='cpu')  # load checkpoint\n",
    "         model = Model(cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create\n",
    "         exclude = ['anchor'] if (cfg or hyp.get('anchors')) and not resume else []  # exclude keys\n",
    "         csd = ckpt['model'].float().state_dict()  # checkpoint state_dict as FP32\n",
    "```\n",
    "解析：\n",
    "1. ```device.type```支持了‘mlu’选项，表示用mlu进行训练；\n",
    "2. 运行到pertrained分支，即采用预训练模型时，此时load权重文件要从cpu端进行获取，即```map_location='cpu'```\n",
    "---\n",
    "```\n",
    "@@ -196,13 +200,14 @@ def train(hyp,  # path/to/hyp.yaml or hyp dictionary\n",
    "     imgsz = check_img_size(opt.imgsz, gs, floor=gs * 2)  # verify imgsz is gs-multiple\n",
    "\n",
    "     # DP mode\n",
    "-    if cuda and RANK == -1 and torch.cuda.device_count() > 1:\n",
    "+    cuda = (device.type=='cuda')\n",
    "+    if cuda and RANK == -1 and torch.cuda.device_count()  > 1:\n",
    "         logging.warning('DP not recommended, instead use torch.distributed.run for best DDP Multi-GPU results.\\n'\n",
    "                         'See Multi-GPU Tutorial at https://github.com/ultralytics/yolov5/issues/475 to get started.')\n",
    "         model = torch.nn.DataParallel(model)\n",
    "\n",
    "     # SyncBatchNorm\n",
    "-    if opt.sync_bn and cuda and RANK != -1:\n",
    "+    if opt.sync_bn and use_device and RANK != -1:\n",
    "         model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model).to(device)\n",
    "         LOGGER.info('Using SyncBatchNorm()')\n",
    "\n",
    "@@ -238,7 +243,7 @@ def train(hyp,  # path/to/hyp.yaml or hyp dictionary\n",
    "         callbacks.run('on_pretrain_routine_end')\n",
    "\n",
    "     # DDP mode\n",
    "-    if cuda and RANK != -1:\n",
    "+    if use_device and RANK != -1:\n",
    "         model = DDP(model, device_ids=[LOCAL_RANK], output_device=LOCAL_RANK)\n",
    "```\n",
    "\n",
    "解析：上述修改涉及于分布式训练，这里暂不对此进行介绍。详细内容可以参考请参考[寒武纪官网文档](https://developer.cambricon.com/index/document/index/classid/3.html) PyTorch相关内容。\n",
    "\n",
    "---\n",
    "```\n",
    "@@ -259,13 +264,25 @@ def train(hyp,  # path/to/hyp.yaml or hyp dictionary\n",
    "     maps = np.zeros(nc)  # mAP per class\n",
    "     results = (0, 0, 0, 0, 0, 0, 0)  # P, R, mAP@.5, mAP@.5-.95, val_loss(box, obj, cls)\n",
    "     scheduler.last_epoch = start_epoch - 1  # do not move\n",
    "-    scaler = amp.GradScaler(enabled=cuda)\n",
    "+    use_amp = (device.type == 'cuda' or opt.pyamp)\n",
    "+    scaler = amp.GradScaler(enabled=use_amp)\n",
    "+\n",
    "     stopper = EarlyStopping(patience=opt.patience)\n",
    "     compute_loss = ComputeLoss(model)  # init loss class\n",
    "     LOGGER.info(f'Image sizes {imgsz} train, {imgsz} val\\n'\n",
    "                 f'Using {train_loader.num_workers} dataloader workers\\n'\n",
    "                 f\"Logging results to {colorstr('bold', save_dir)}\\n\"\n",
    "                 f'Starting training for {epochs} epochs...')\n",
    "+\n",
    "+    metric_collector = MetricCollector(\n",
    "+        enable_only_benchmark=True,\n",
    "+        record_elapsed_time=True,\n",
    "+        record_hardware_time=True if opt.device == 'mlu' else False)\n",
    "+    if opt.iters != -1:\n",
    "+        epochs = start_epoch + math.ceil(opt.iters*1.0 / len(train_loader))\n",
    "+        iters = opt.iters\n",
    "+    metric_collector.place()\n",
    "+\n",
    "     for epoch in range(start_epoch, epochs):  # epoch ------------------------------------------------------------------\n",
    "         model.train()\n",
    "```\n",
    "解析：\n",
    "1. 添加MLU对自动混合精度（Automatic mixed precision，AMP）训练。\n",
    "\n",
    "---\n",
    "```\n",
    "@@ -449,7 +475,7 @@ def parse_opt(known=False):\n",
    "     parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')\n",
    "     parser.add_argument('--cache', type=str, nargs='?', const='ram', help='--cache images in \"ram\" (default) or \"disk\"')\n",
    "     parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')\n",
    "-    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')\n",
    "+    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu; mlu')\n",
    "     parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')\n",
    "     parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')\n",
    "     parser.add_argument('--adam', action='store_true', help='use torch.optim.Adam() optimizer')\n",
    "@@ -465,6 +491,9 @@ def parse_opt(known=False):\n",
    "     parser.add_argument('--freeze', type=int, default=0, help='Number of layers to freeze. backbone=10, all=24')\n",
    "     parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')\n",
    "     parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')\n",
    "+    parser.add_argument('--pyamp', action='store_true', help='using amp for mixed precision training')\n",
    "+    parser.add_argument('--iters', type=int, default=-1, help=\"Total iters for benchmark.\")\n",
    "+    parser.add_argument('--skip', action='store_true', help='skip val or save pt.')\n",
    "```\n",
    "解析：\n",
    "1. ```--device``` 运行参数增添‘mlu’选项。\n",
    "2. ```--pyamp``` MLU训练支持自动混合精度训练，新增此参数设置。\n",
    "\n",
    "---\n",
    "```\n",
    "@@ -501,16 +530,28 @@ def main(opt, callbacks=Callbacks()):\n",
    "             opt.exist_ok, opt.resume = opt.resume, False  # pass resume to exist_ok and disable resume\n",
    "         opt.save_dir = str(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))\n",
    "\n",
    "+    if opt.device == 'mlu':\n",
    "+        print(is_device_available)\n",
    "+        assert is_device_available['mlu'], \"\"\n",
    "+        device = torch.device('mlu')\n",
    "+    else:\n",
    "+        device = select_device(opt.device, batch_size=opt.batch_size)\n",
    "```\n",
    "解析：\n",
    "1. 该步骤较为重要，表示当运行device选取为'mlu'时的配置选项。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.4 训练\n",
    "\n",
    "这里提供了三种运行方式：\n",
    "\n",
    "1. Training: 基于数据从头开始训练；\n",
    "2. From pretrained training：基于原始代码权重文件进行训练；\n",
    "3. Resume Training：在上次训练基础上继续训练。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Train YOLOv5m on COCO for 1 epochs in mlu device\n",
    "!cd yolov5 && python train.py --img 640 --batch 28 --epochs 1 --data  coco.yaml --weights \"\" --cfg yolov5m.yaml --device mlu"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. From pretrained training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!cd yolov5 && python train.py --batch 28 --epochs 1 --data coco.yaml --cfg yolov5m.yaml  --device mlu  --weights ../yolov5_model/yolov5m.pt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Resume Training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "!cd yolov5 && python train.py --resume --batch 28 --data coco.yaml --cfg yolov5m.yaml  --device mlu  --weights ./runs/train/exp3/weights/last.pt "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "注意根据最终训练版本修改 --weights 的参数。本实验由于时间关系，故 Resume Training 注释方式提供\n",
    "\n",
    "**参数说明：**\n",
    "\n",
    "* imgsz: 训练、验证时图像大小，默认640；\n",
    "* epoch: 训练迭代次数；\n",
    "* batch：单个批次大小，若是运行中提示内存错误，可以考虑减少该数值；\n",
    "* data: dataset.yaml 文件路径；\n",
    "* weights: 初始化权重路径，若提示“No such file or directory”,注意查询对应文件夹是否有相应权重文件；\n",
    "* cfg: model.yaml 路径；\n",
    "* device: 运行设备选取，如 mlu，cuda 或 cpu； \n",
    "* resume: 在最近训练结果继续训练；\n",
    "* 运行命令中可添加 `--pyamp` 参数进行自动混合精度训练；\n",
    "* 更多参数设置及解析在 train.py 文件 parse_opt 函数中查看；\n",
    "* 超参可在```/yolov5/data/hyps/typ.scratch.yaml```中设置。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.5 精度验证"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "!cd yolov5 && python val.py --data coco.yaml --conf 0.001 --iou 0.65 --weight runs/train/exp4/weights/best.pt --device mlu"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为该文档和代码主要教大家如何在寒武纪MLU上运行 YOLOv5 的训练，其中迭代次数epoch设置较小，故精度验证以注释方式提供(若需要运行，建议测试 From pretrained training 保存的效果，并将上行代码改成 code 的形式)，只是验证该命令可用，不作实际精度测试。测试时注意根据上述训练权重保存情况修改 weight 的参数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5 结语\n",
    "\n",
    "从上述适配流程可知，采用 MLU370 进行 AI 模型训练流程与 GPU 使用较为一致，方便用户学习与扩展，极大降低了模型迁移成本和学习时间。同时采用 MLU370 训练能够加速模型的训练速度，与寒武纪MagicMind推理平台相配合使用，丰富训推一体平台的功能完善，让机器更好地理解与服务人类。  \n",
    "\n",
    "## 5.1 回顾重点步骤\n",
    "至此，基于寒武纪 MLU370 与 PyTorch 框架下的 YOLOv5 训练实验已经完毕。让我们回顾一下在使用寒武纪 MLU370 与 PyTorch 框架下，都有哪些主要开发步骤：\n",
    "1. 新增MLU device支持，将模型与数据使用MLU进行训练。\n",
    "2. 各种训练方式的使用，如采用预训练模型、finetune、resume以及自动混合精度的训练设置。\n",
    "3. 使用MLU进行精度验证。\n",
    "\n",
    "## 5.2 相关链接  \n",
    "\n",
    "1. 对上述代码有疑问请提交ISSUE:  \n",
    "https://gitee.com/cambricon/practices/issues  \n",
    "\n",
    "2. 更多与寒武纪开发相关的有趣内容请移步至寒武纪开发者社区：    \n",
    "https://developer.cambricon.com/\n",
    "\n",
    "3. 如果有任何其他问题可到寒武纪开发者论坛提问，会有专人为您解答：  \n",
    "https://forum.cambricon.com//list-1-1.html\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
